Abstract. Three different approaches to derive mutual information via thermodynamics 
are presented, where the temperature-dependent energy is given by: (a) 
βℰ = -ln[P(X,Y)], (b) βℰ = -ln[P(Y|X)] or (c) βℰ = -ln[P(X|Y)]. All approaches 
require the extension of the traditional physical framework and the modification 
of the 2nd law of thermodynamics. A realization of a physical system with an 
effective temperature-dependent Hamiltonian is discussed, followed by a suggestion of 
a physical information-heat engine. 

Figure 1. A schematic communication channel. 



1. Introduction 

The generic problem in information processing is the transmission of information over a 
noisy communication channel [1-3]. The transmission can be mathematically described 
by two random variables X and Y representing the desired information and its noisy 
replica, respectively. A schematic figure of a communication channel is depicted in Fig. 
1. The basic properties of a communication system are: P(X), which is the probability 
of transmitting a symbol X taken from the input alphabet, and P(Y|X), which stands 
for the probability of receiving a symbol Y (taken from the output alphabet) following 
the transmission of a symbol X. Noisy transmission can occur either via space from one 
geographical point to another, as happens in communications, or in time, for example, 
when sequentially writing and reading files from a hard disk in the computer. 

Mutual information, I(X;Y), is a principal quantity in information theory which 
quantifies the amount of information in common between two random variables. It is 
used to upper bound the attainable rate of information transferred across a channel. A 
basic definition of the mutual information is 

$$ I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X), \qquad (1) $$

where H(·) is Shannon's information entropy (in nats) [4]. The entropy 
measures the amount of uncertainty in a random variable, indicating how easily data 
can be losslessly compressed. Hence, knowing Y, we can save an average of I(X;Y) nats 
in encoding X compared to not knowing Y [1,5]. 

A fundamental link between information theory and thermodynamics was first 
established five decades ago by Jaynes [6]. However, his work did not include an explicit 
relation between mutual information and thermodynamics. 

Recently, it has been proven [7,8] that the mutual information can be reformulated 
as a consequence of the laws of thermodynamics, where the corollary was exemplified 
for the Gaussian noisy channel and for the binary symmetric channel. The modeling 
of the communication channels as a thermal system required the generalization of 
thermodynamics to include T-dependent Hamiltonians and the generalization of the 
second law of thermodynamics, which was proved to have the following form 

$$ dQ = T\,dS + \left\langle \frac{d\mathcal{E}}{dT} \right\rangle dT, \qquad (2) $$

where ⟨·⟩ denotes averaging over the standard Boltzmann distribution. 
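
As a numerical sanity check, the sketch below tests a second law of the form dQ = T dS + ⟨dℰ/dT⟩ dT for a system on which no work is done (so that dQ = dU). The three energy levels and their temperature dependence are hypothetical, chosen only for illustration, and the Boltzmann constant is set to unity.

```python
import math

def levels(T):
    """Hypothetical temperature-dependent energy levels (illustrative only)."""
    return [0.0, 1.0 + 0.5 * T, 2.0 * T * T]

def dlevels_dT(T):
    """Analytic derivatives dE_i/dT of the hypothetical levels above."""
    return [0.0, 0.5, 4.0 * T]

def boltzmann(T):
    """Boltzmann probabilities p_i = exp(-E_i/T) / Z, with k_B = 1."""
    weights = [math.exp(-e / T) for e in levels(T)]
    z = sum(weights)
    return [w / z for w in weights]

def internal_energy(T):
    """U = sum_i p_i E_i."""
    return sum(p * e for p, e in zip(boltzmann(T), levels(T)))

def entropy(T):
    """S = -sum_i p_i ln p_i (in nats)."""
    return -sum(p * math.log(p) for p in boltzmann(T))

def second_law_residual(T, h=1e-5):
    """dU/dT - (T dS/dT + <dE/dT>), using central finite differences."""
    dU = (internal_energy(T + h) - internal_energy(T - h)) / (2 * h)
    dS = (entropy(T + h) - entropy(T - h)) / (2 * h)
    mean_dE = sum(p * d for p, d in zip(boltzmann(T), dlevels_dT(T)))
    return dU - (T * dS + mean_dE)

residual = second_law_residual(1.3)
```

Up to finite-difference error, `residual` should vanish; dropping the ⟨dℰ/dT⟩ term leaves a mismatch whenever the levels depend on T.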

In a communication channel the goal is to estimate the transmitted symbol X from 
the received symbol Y (Fig. 1), hence the main quantity of interest is P(X|Y). A 
physical system with properties equivalent to those of the communication channel has to obey 
the following 

$$ P(X|Y) = \frac{e^{-\beta\mathcal{E}}}{Z_Y} = \frac{P(X,Y)}{P(Y)}. \qquad (3) $$

This new bridge between mutual information and thermodynamics requires the 
extension of the traditional physical framework and the following two questions are at 
the center of the first part of our work. The first one is whether the mapping between 
mutual information and thermodynamics as well as the physical energy governing a given 
communication channel is uniquely defined. In case the energy function is not uniquely 
defined, the question is whether the required extension of the physical framework is a 
necessary ingredient, or there is a physical way to express the mutual information using 
the traditional physical framework without altering the second law of thermodynamics. 

The answers to the above questions are that the mapping between mutual 
information and thermodynamics as well as the physical energy governing a given 
communication channel is not uniquely defined and requires the extension of the physical 
framework. In the following we present three primary approaches, followed by a 
discussion of a possible physical system with an effective T-dependent Hamiltonian and 
a possible realization of an information-heat engine. Details of the derivations are left for 
Appendix A, whereas Appendix B exemplifies the calculation of the mutual information 
of a few archetypal communication channels via thermodynamics. 



2. 1st approach - Boltzmann factor ∝ P(X,Y) 

This approach takes the joint probability P(X,Y) to be the Boltzmann factor and 
defines a T-dependent energy, 

$$ \mathcal{E} = -\frac{1}{\beta}\ln[P(X,Y)]. \qquad (4) $$

This form of the energy is a naive physical energy definition, since it adequately 
describes a physical system consisting of two degrees of freedom, X and Y, in 
contact with a macroscopic heat reservoir. At equilibrium the expectation properties of X and 
Y are determined following the partition function Z = Σ exp(-βℰ) [9,10]. Using the 
T-dependent Hamiltonian (4) to describe a communication channel (e.g. [7], Eq. 44) 
enforces the generalization of the second law of thermodynamics (2), and the mutual 
information takes the following form [7,8] 

$$ I(X;Y) = -\mathrm{E}_{Y;\beta}\left\{ \gamma U \Big|_{\gamma=0}^{\beta} - \int_{0}^{\beta} \left( U + \gamma \left\langle \frac{d\mathcal{E}}{d\gamma} \right\rangle \right) d\gamma \right\}, \qquad (5) $$

where E_{Y;β}{·} denotes expectation of the random object within the brackets with respect 
to the subscript random variable Y, for a given inverse temperature β. 

3. 2nd approach - Boltzmann factor ∝ P(Y|X) 

This approach refers to P(Y|X) as the Boltzmann factor [11] and the resulting energy is 

$$ \mathcal{E} = -\frac{1}{\beta}\ln[P(Y|X)]. \qquad (6) $$

The definition of the energy, (6), is based on the interpretation of the prior 
probability of the inputs of the channel, P(X), as the degeneracy of the energy level 
ℰ [11]. This approach was recently adopted also by [12]. It depicts a scenario of 
communication channels where the output, Y, is estimated by the input, X. Since 
the degeneracy of the input of the channel can be designed arbitrarily, the degeneracy 
of the physical energy function (6) may decrease while the energy increases, in contrast 
to physical systems. The emission of heat to the reservoir decreases the energy and 
increases the entropy. Hence, both terms of the free energy identity, F = U - TS, 
decrease and the system is unstable to thermal fluctuations. This situation demands a 
modification of both the free energy and the second law of thermodynamics (8), (9). 

The mutual information for energy (6) is given by 

$$ I(X;Y) = -\mathrm{E}_{Y;\beta}\{\beta(U - F)\}, \qquad (7) $$

where the free energy (as implicitly suggested in [8]) and the second law are effectively 
modified to be 

$$ F = U + T\,D_{KL}(P(X|Y)\,\|\,P(X)) \qquad (8) $$

and 

$$ dQ = -T\,dD_{KL}(P(X|Y)\,\|\,P(X)), \qquad (9) $$

respectively. For clarification, F is the free energy, D_KL(·‖·) denotes the Kullback-Leibler 
divergence [1] and the Boltzmann constant is arbitrarily taken to be unity. The 
derivation of (7)-(9) is detailed in Appendix A. Note that when P(X) is uniformly 
distributed, the conventional identity of the free energy, F = U - TS, and the second 
thermodynamic law, dQ = TdS, are restored. 
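
The identity underlying this approach, I(X;Y) = E_Y{D_KL(P(X|Y)‖P(X))}, can be checked numerically for any discrete channel. The following is a minimal sketch; the function names and the example prior and channel matrix are illustrative assumptions, not taken from the cited works.

```python
import math

def output_distribution(p_x, p_y_given_x):
    """P(Y=j) = sum_i P(X=i) P(Y=j | X=i)."""
    n_y = len(p_y_given_x[0])
    return [sum(p_x[i] * p_y_given_x[i][j] for i in range(len(p_x)))
            for j in range(n_y)]

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) = H(Y) - H(Y|X), in nats."""
    p_y = output_distribution(p_x, p_y_given_x)
    h_y = -sum(q * math.log(q) for q in p_y if q > 0)
    h_y_given_x = -sum(p_x[i] * w * math.log(w)
                       for i in range(len(p_x))
                       for w in p_y_given_x[i] if w > 0)
    return h_y - h_y_given_x

def expected_kl(p_x, p_y_given_x):
    """E_Y{ D_KL(P(X|Y) || P(X)) }, in nats."""
    p_y = output_distribution(p_x, p_y_given_x)
    total = 0.0
    for j, q in enumerate(p_y):
        if q == 0:
            continue
        kl = 0.0
        for i, prior in enumerate(p_x):
            posterior = prior * p_y_given_x[i][j] / q  # Bayes' law
            if posterior > 0:
                kl += posterior * math.log(posterior / prior)
        total += q * kl
    return total

# Illustrative (hypothetical) non-uniform prior and noisy channel.
prior = [0.2, 0.5, 0.3]
channel = [[0.8, 0.1, 0.1],
           [0.2, 0.7, 0.1],
           [0.1, 0.3, 0.6]]
```

For this example `mutual_information(prior, channel)` and `expected_kl(prior, channel)` agree to machine precision.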

4. 3rd approach - Boltzmann factor ∝ P(X|Y) 

We propose a new approach, where we define the Boltzmann factor to be P(X|Y). As 
a result, the T-dependent energy is 

$$ \mathcal{E} = -\frac{1}{\beta}\ln[P(X|Y)]. \qquad (10) $$

The 3rd approach describes a typical communication system where the input, X, is 
estimated by the output, Y. Nevertheless, for this kind of energy function, the partition 
function is normalized to Z = 1, independent of the temperature. This approach also 
requires the generalization of the second law (2); however, the mutual information takes 
a simple form involving the internal energy only 

$$ I(X;Y) = -\mathrm{E}_{Y;\beta}\left\{ \gamma U \Big|_{\gamma=0}^{\beta} \right\}. \qquad (11) $$

Table 1. A comparison between the three approaches to connect the mutual 
information via thermodynamics. Each approach requires the extension of the 
traditional physical framework and yields modified definitions for the free energy 
and/or for the second law of thermodynamics. 

Note that the proposed approach encompasses the other two approaches [7,8,11]. 
On the one hand, the mutual information (11) can easily be deduced from the 2nd 
approach using F = 0 in (7). On the other hand, the energy function (10) explicitly 
indicates that the second term of Eq. (5) is identically zero. A comprehensive derivation 
of the 3rd approach is exhibited in Appendix A. 

A synopsis of a comparison between the three approaches is depicted in Table 1. 
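
The vanishing of the integral term of the 1st-approach expression when Z = 1 can be made explicit in one step. The short derivation below is a sketch in the notation used here, with γ a running inverse temperature, ℰ(γ) the T-dependent energy and averages taken over the Boltzmann distribution for a given Y.

```latex
\begin{align*}
\ln Z(\gamma) &= \ln \sum_{X} e^{-\gamma \mathcal{E}(\gamma)}, \\
\frac{d \ln Z}{d\gamma}
  &= -\left\langle \frac{d(\gamma \mathcal{E})}{d\gamma} \right\rangle
   = -\left( U + \gamma \left\langle \frac{d\mathcal{E}}{d\gamma} \right\rangle \right), \\
Z \equiv 1 \;&\Rightarrow\;
  U + \gamma \left\langle \frac{d\mathcal{E}}{d\gamma} \right\rangle = 0
  \quad \text{for all } \gamma .
\end{align*}
```

Hence the integrand U + γ⟨dℰ/dγ⟩ is zero whenever the partition function is independent of the temperature, and only the boundary term γU remains.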



5. Physical information-heat engine 

The extension of the physical framework to include T-dependent Hamiltonians and 
the generalized second law of thermodynamics might also refresh our viewpoint on 
traditional physical systems. 

A prototypical physical system governed by an effective T-dependent Hamiltonian 
is a spring where the spring constant is a function of the temperature, K = K(T) [13]. 
The energy of the spring is 

$$ \mathcal{E} = \frac{1}{2}K(T)z^{2}, \qquad (12) $$

where z denotes the extension of the spring from a reference position at which no 
force acts on the spring. Note that the common scenario is that the free energy is an 
explicit function of the temperature. However, in our case, the (effective) Hamiltonian 
is a function of the temperature. This dependence calls for an explanation, since the 
fundamental potentials (gravitational, electromagnetic, etc.) governing the known physical 
laws are independent of the temperature. The resolution of this puzzle of a T-dependent 
Hamiltonian is that the spring is represented by one macroscopic degree of freedom, and 
its properties are a consequence of coarse graining over the many microscopic degrees of 
freedom and the nonlinear forces among them. 

A mass, M, is connected to one end of the spring, and the spring with the attached 
mass hangs in an evacuated container (Fig. 2). The container is connected 
to a heat reservoir at a temperature T, and for simplicity of the following discussion 
we assume that the spring constant decreases monotonically with the temperature. The 
equilibrium situation at two different temperatures, T_H > T_C, is depicted in Fig. 3. 
We now turn to describe a possible information-heat engine based on such a T-dependent 
Hamiltonian. 

Figure 2. A spring with a temperature-dependent spring constant, K(T), connected 
to a mass M, hangs in an evacuated container. 

Figure 3. A container in thermal contact with a heat reservoir at a high temperature, 
T_H (left panel), and with a heat reservoir at a cold temperature, T_C (right panel). We 
assume that the spring constant decreases monotonically with the temperature, hence 
the extension of the spring at T_H is greater than at T_C. 

Figure 4. A Carnot cycle acting as a heat engine, illustrated on a temperature-entropy 
diagram. The vertical axis is temperature, the horizontal axis is entropy. The 
cycle takes place between a hot reservoir at temperature T_H and a cold reservoir at 
temperature T_C. 

A Carnot cycle acting as a heat engine is illustrated by the black cycle in the 
temperature-entropy diagram in Fig. 4. The Carnot cycle consists of 4 steps, alternating 
isothermal and adiabatic processes [9,10]. The cycle of the information-heat engine 
consists of two steps only and is illustrated by the red lines in Fig. 4. The first step, from 
D to B, describes a quasi-static process where the temperature increases from T_C to T_H; 
the entropy increases together with the temperature, since the Hamiltonian is an explicit 
function of the temperature. In the reversed process, from B to D, the temperature 
decreases in a quasi-static manner back to T_C and the cycle is completed. No work is 
done in the entire cycle, D-B-D, since the container is evacuated, and mathematically 
the area enclosed by the cycle D-B-D in the (S,T) plane is zero. 

The heat absorbed/emitted by the C/H reservoirs is responsible for the following 
two main changes of the system (spring+mass) placed in the container: (a) the kinetic 
energy of the microscopic degrees of freedom is modified; (b) the Hamiltonian of the 
spring is modified via the T-dependent spring constant. This process was named 
"channel work" in [8,14], since it reduces the effective heat contributing to the change 
in the entropy and it resembles work. However, no actual work is done. Note that, in 
principle, at equilibrium the macroscopic mass, M, also oscillates as a harmonic oscillator, 
since each degree of freedom has on average a kinetic energy equal to k_B T/2; 
however, these microscopic vibrations are neglected. 

The information-heat engine depicted in Fig. 4 describes a way to generate bits in 
a manner resembling a traditional heat engine, but without work. The height of 
the mass M represents the generated bit: the cold position represents "0" whereas the 
hot position represents "1". A sequence of bits can be generated by using a 
predetermined protocol indicating the frequency (bandwidth) for the generation of bits. 
For instance, in the event that the current bit is "0" and the successor bit is "0" too, 
the contact with the cold reservoir remains, but in the case of a successor "1", the container 
is brought into contact with the hot reservoir. 
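
The protocol described above can be sketched in a few lines. Everything in the block is an illustrative assumption rather than part of the proposal itself: the function names, the specific decreasing form of K(T), and the use of the static balance z = Mg/K(T) for the extension of a hanging spring under gravity.

```python
def reservoir_schedule(bits):
    """Map each bit to the reservoir in contact: 'C' (cold) for 0, 'H' (hot) for 1."""
    schedule = []
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        schedule.append("H" if b == 1 else "C")
    return schedule

def count_switches(schedule, initial="C"):
    """Count how often the container must be moved to the other reservoir."""
    switches = 0
    current = initial
    for reservoir in schedule:
        if reservoir != current:
            switches += 1
            current = reservoir
    return switches

def spring_constant(T, k0=10.0, alpha=0.01):
    """Hypothetical K(T), monotonically decreasing with the temperature."""
    return k0 / (1.0 + alpha * T)

def equilibrium_extension(mass, T, g=9.81):
    """Static extension z = M g / K(T) of the hanging spring (assumes gravity g)."""
    return mass * g / spring_constant(T)

schedule = reservoir_schedule([0, 0, 1, 1, 0])
```

For the sequence 0, 0, 1, 1, 0 the container changes reservoir twice, and with a decreasing K(T) the hot position always lies below the cold one, so the two heights are distinguishable.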

The proposed information-heat engine describes a way to generate the information, 
a sequence of bits, in a 2-step cycle and without work. The generation of 
the communication channel requires a fundamental physical mechanism to transmit the 
bits with minimal work, in order to enhance the efficiency of the process. All such 
mechanisms have to "read" and to estimate the height of the mass in the container. 
Note that the framework of a noisy communication channel allows a distortion of the 
information; however, the encoder represents a noise-free process, where the noise is 
added during the transmission only. Hence, the information-heat engine has no 
inherent noise. There are many possible mechanisms to estimate the height of the mass 
using, for instance, photons reflected or transmitted according to the presence or absence 
of the mass at a given height; however, this is beyond the scope of our work. 

In the above, we presented a possible mechanism which resembles the Carnot 
engine, but without work. There are many alternative physical ways to 
realize an information-heat engine, for instance, using a material which undergoes 
a ferromagnetic/paramagnetic transition between T_C and T_H. However, the essence of 
such an information-heat engine is a T-dependent Hamiltonian. 

Appendix A. Derivations of the 2nd and 3rd approaches 

Appendix A.1. Derivation of the 2nd approach 

The minimal mutual information for a given expected distortion, E_{X,Y}{d(Y,X)}, can 
be found by minimizing the functional ℱ(P(X|Y)) = I(X;Y) + β E_{X,Y}{d(Y,X)} over 
all normalized distributions P(X|Y) [11]. The solution of the variational problem is the 
normalized probability 

$$ P(X|Y) = \frac{P(X)\,e^{-\beta d(Y,X)}}{Z(Y,\beta)}, \qquad (A.1) $$

where ln Z(Y,β) = λ(Y) and β are the Lagrange multipliers of the normalization 
and the expected distortion constraints, respectively. Moreover, β is positive and 
satisfies [11] 

$$ \beta = -\frac{\delta I(X;Y)}{\delta\,\mathrm{E}_{X,Y}\{d(Y,X)\}}. \qquad (A.2) $$

In order to satisfy the energy definition (6), and using Bayes' law, a comparison of 
(A.1) with the Boltzmann distribution law yields the following mapping: d(Y,X) → ℰ, 
β → 1/T, P(X) is used as the degeneracy of the energy level d(Y,X) and Z(Y,β) is 
the partition function for a given Y. 

The internal energy of the system, U, is the expectation value of the energy, d(Y,X). 
By equating the Lagrange multiplier β of (A.2) to the second law of thermodynamics, 
β = dS/dQ, it is easy to see that this system obeys the following mapping 

$$ S \to -I. \qquad (A.3) $$

A verification of (A.3) can be obtained using the relation 

$$ F = T\mathcal{F} = U + TI, \qquad (A.4) $$

followed by comparing the free energy identity, F = U - TS, to ℱ (see Eq. 12 in [11]). 

Substituting I = E_{Y;β}{D_KL(P(X|Y)‖P(X))} [11] into Eqs. (A.3), (A.4), and based 
on the first law of thermodynamics in the absence of work, dQ = dU, we obtain Eqs. 
(7)-(9). 



Appendix A.2. Derivation of the 3rd approach 

The definitions of the marginal and conditional entropies constituting the mutual 
information (1) are H(X) = -Σ_X P(X) ln P(X) and H(X|Y) = -Σ_{X,Y} P(X,Y) ln P(X|Y), 
respectively. Note that when X and Y are independent random variables, H(X|Y) 
becomes H(X). 
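
These definitions translate directly into code. The sketch below computes H(X), H(X|Y) and their difference from a joint probability table; the function name and the example tables are illustrative assumptions.

```python
import math

def entropies(joint):
    """Return (H(X), H(X|Y), I(X;Y)) in nats for joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]        # marginal of X
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y
    h_x = -sum(p * math.log(p) for p in p_x if p > 0)
    # H(X|Y) = -sum_{x,y} P(x,y) ln P(x|y), with P(x|y) = P(x,y) / P(y)
    h_x_given_y = -sum(pxy * math.log(pxy / p_y[j])
                       for row in joint
                       for j, pxy in enumerate(row) if pxy > 0)
    return h_x, h_x_given_y, h_x - h_x_given_y

# Independent variables: P(x, y) = P(x) P(y), so H(X|Y) = H(X) and I = 0.
independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]

# Noiseless binary channel with equiprobable inputs: H(X|Y) = 0 and I = ln 2.
noiseless = [[0.5, 0.0],
             [0.0, 0.5]]
```

The two example tables reproduce the limiting cases: zero mutual information for independent variables and I(X;Y) = H(X) for a noiseless channel.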

We introduce a new variable β which represents the noise in the channel and has 
the following properties 

$$ P(X,Y;\beta=0) = P(X)\,P(Y;\beta=0), \qquad P(X|Y;\beta=0) = P(X), \qquad (A.5) $$



Mutual information via thermodynamics: Three different approaches 






where P(Y;β) is the probability of receiving Y for a given noise β. As a result, the 
conditional entropy becomes an explicit function of β. Using Bayes' law and defining 

$$ S(X|Y;\beta) = -\sum_{X} P(X|Y;\beta)\ln P(X|Y;\beta), \qquad (A.6) $$

we can write the entropies as 

$$ H(X) = S(X|Y;\beta=0), \qquad (A.7) $$
$$ H(X|Y;\beta) = \mathrm{E}_{Y;\beta}\{S(X|Y;\beta)\}. \qquad (A.8) $$

Note that a noiseless channel is represented by the limit β → ∞. Taking into account 
that H(X) is independent of Y, we can write 

$$ H(X) = \mathrm{E}_{Y;\beta}\{S(X|Y;\beta=0)\}. \qquad (A.9) $$

Substituting Eqs. (A.8), (A.9) into Eq. (1), we obtain a new form of the mutual 
information 

$$ I(X;Y) = -\mathrm{E}_{Y;\beta}\left\{ S(X|Y;\gamma)\Big|_{\gamma=0}^{\beta} \right\}. \qquad (A.10) $$

Following (10), it is clear that the partition function of the equivalent 
thermodynamic system is Z = 1, hence Eq. (A.6) is 

$$ S(X|Y;\gamma) = -\sum_{X} e^{-\gamma\mathcal{E}}(-\gamma\mathcal{E}) = \gamma U, \qquad (A.11) $$

where U is the thermodynamic average of ℰ, divided by the partition function, or in 
other words, the internal energy. The free energy obeys F = -(1/β) ln Z = U - S/β = 0, 
hence S = βU. Substituting (A.11) into (A.10), we finally obtain a much simpler 
thermodynamic form of the mutual information, as the difference of the internal energies 
of the system, 

$$ I(X;Y) = -\mathrm{E}_{Y;\beta}\left\{ \gamma U \Big|_{\gamma=0}^{\beta} \right\}. \qquad (A.12) $$
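
The statements Z = 1, F = 0 and S = βU for an energy of the form ℰ = -(1/β) ln P(X|Y) are easy to confirm numerically for a single received symbol Y. The sketch below does so; the posterior vector and the value of β are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def energy_levels(posterior, beta):
    """E(x) = -(1/beta) ln P(x|y) for one fixed received symbol y."""
    return [-math.log(p) / beta for p in posterior]

def thermodynamics(posterior, beta):
    """Return (Z, U, S, F) of the Boltzmann distribution built from the levels."""
    levels = energy_levels(posterior, beta)
    weights = [math.exp(-beta * e) for e in levels]      # recovers P(x|y)
    z = sum(weights)                                     # partition function
    u = sum(w * e for w, e in zip(weights, levels)) / z  # internal energy
    s = -sum((w / z) * math.log(w / z) for w in weights) # entropy in nats
    f = -math.log(z) / beta                              # free energy
    return z, u, s, f

# Arbitrary illustrative posterior P(X|Y=y) and inverse temperature.
z, u, s, f = thermodynamics([0.7, 0.2, 0.1], beta=2.5)
```

Here `z` equals 1 and `f` equals 0 up to rounding, while `s` coincides with `2.5 * u`, for any normalized posterior and any positive β.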

Appendix B. Applications of the 3rd approach 

The new description of the mutual information (11), (A.12) is exemplified over several 
archetypal communication channels. The sketch of the calculations is presented for the 
Gaussian channel with Gaussian input, the Gaussian channel with Bernoulli-1/2 input and, 
finally, the binary symmetric channel with a biased input (biased BSC). For the 
examples we shall use the following notations: P(X) = P(X = x), P(Y) = P(Y = y). 

Appendix B.1. Gaussian channel with N(0,1) input 

The input and the transition probabilities of this channel are 
$$ P(X) = \mathcal{N}(0,1), \qquad P(Y|X) = \mathcal{N}(X, 1/\beta). \qquad (B.1) $$

Hence, the energy, according to Bayes' law and (10), is 

$$ \mathcal{E} = \frac{1+\beta}{2\beta}\left(x - \frac{\beta y}{1+\beta}\right)^{2} - \frac{1}{2\beta}\ln\frac{1+\beta}{2\pi}. \qquad (B.2) $$

Using (A.12) one can easily find the formula for the Shannon capacity [4], 

$$ I(X;Y) = \frac{1}{2}\ln(1+\beta), \qquad (B.3) $$

which is identical to the mutual information derived from the 1st approach, Eqs. (4), (5) [8]. 

Appendix B.2. Gaussian channel with Bernoulli-1/2 input 

This case is characterized by equiprobable binary inputs, with the input and transition 
probabilities given by 

$$ P(X=1) = P(X=-1) = 1/2, \qquad P(Y|X) = \mathcal{N}(X, 1/\beta). \qquad (B.4) $$

Implementing Bayes' law, where the factors which are independent of x cancel in the 
normalization, yields a simple expression for the energy (10), 

$$ \mathcal{E} = -xy + \frac{1}{\beta}\ln[2\cosh(\beta y)], \qquad (B.5) $$

which eventually gives us the known Shannon-theoretic result [2], 

$$ I(X;Y) = \beta - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left(-\frac{y^{2}}{2}\right)\ln\cosh\left(\beta - \sqrt{\beta}\,y\right)dy. \qquad (B.6) $$
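
A numerical cross-check of the integral expression I(X;Y) = β - E{ln cosh(β - √β y)}, with y a standard normal variable, is sketched below. It is compared against a direct evaluation of h(Y) - h(Y|X) under the assumed channel model Y = X + N, with X = ±1 equiprobable and N Gaussian with variance 1/β. The trapezoidal integrator, the integration limits and all function names are illustrative choices.

```python
import math

def log_cosh(u):
    """Numerically stable ln cosh(u)."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

def mi_formula(beta):
    """I = beta - E{ ln cosh(beta - sqrt(beta) y) }, y ~ N(0, 1), in nats."""
    def integrand(y):
        gauss = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        return gauss * log_cosh(beta - math.sqrt(beta) * y)
    return beta - trapezoid(integrand, -12.0, 12.0)

def mi_direct(beta):
    """I = h(Y) - h(Y|X) for the assumed model Y = X + N, N ~ N(0, 1/beta)."""
    sigma = 1.0 / math.sqrt(beta)
    norm = math.sqrt(beta / (2.0 * math.pi))
    def density(y):
        return 0.5 * norm * (math.exp(-0.5 * beta * (y - 1.0) ** 2)
                             + math.exp(-0.5 * beta * (y + 1.0) ** 2))
    def integrand(y):
        p = density(y)
        return -p * math.log(p) if p > 0.0 else 0.0
    h_y = trapezoid(integrand, -1.0 - 12.0 * sigma, 1.0 + 12.0 * sigma)
    h_y_given_x = 0.5 * math.log(2.0 * math.pi * math.e / beta)
    return h_y - h_y_given_x
```

The two routines agree to within the quadrature error, and both stay below ln 2, the entropy of the binary input.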



Appendix B.3. Biased binary symmetric channel 

In this case the prior distribution of the biased input and the probability for a symbol 
to flip during the transmission are denoted as 

$$ P(X=-1) = p, \qquad P(X=1) = 1-p \qquad (B.7) $$

and 

$$ P(Y=\pm 1\,|\,X=\mp 1) = \delta, \qquad (B.8) $$

respectively. Hence, the probabilities are defined as follows, 

$$ P(X) = p^{\frac{1-x}{2}}(1-p)^{\frac{1+x}{2}}, \quad x \in \{-1,1\}, $$
$$ P(Y|X) = \delta^{\frac{1-xy}{2}}(1-\delta)^{\frac{1+xy}{2}}, \quad x,y \in \{-1,1\}. \qquad (B.9) $$

The energy (10) is now given by 



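$$ \mathcal{E} = -\frac{xy}{2} - \frac{\ln((1-\delta)\delta)}{2\beta} - \frac{x}{2\beta}\ln\frac{1-p}{p} - \frac{\ln((1-p)p)}{2\beta} + \frac{1+y}{2\beta}\ln(1-p-\delta+2p\delta) + \frac{1-y}{2\beta}\ln(p+\delta-2p\delta), \qquad (B.10) $$

where the inverse temperature, β, is defined as 

$$ \beta = \ln\frac{1-\delta}{\delta}. \qquad (B.11) $$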



Applying (A.12), we finally obtain the mutual information, 

$$ I(X;Y) = \delta\ln\delta + (1-\delta)\ln(1-\delta) - (p+\delta-2\delta p)\ln(p+\delta-2\delta p) - (1-p-\delta+2\delta p)\ln(1-p-\delta+2\delta p). \qquad (B.12) $$

For p = 1/2 the mutual information of the BSC is restored [14], 

$$ I(X;Y) = \delta\ln\delta + (1-\delta)\ln(1-\delta) + \ln 2. $$

Note that β is a function of the noise, δ, solely, i.e. it is not affected by the nature 
of the input, P(X). 
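
The biased-BSC result can be verified numerically by building the energy ℰ = -(1/β) ln P(X|Y) explicitly and evaluating I = -E_Y{βU - H(X)}, where the γ = 0 boundary term equals H(X). The sketch below assumes the inverse temperature β = ln((1-δ)/δ) with 0 < δ < 1/2; the function names and the test values of p and δ are illustrative.

```python
import math

def bsc_model(p, delta):
    """Return (beta, P(X), P(Y), energy) for the biased BSC with x, y in {-1, 1}."""
    beta = math.log((1.0 - delta) / delta)  # assumed form of the inverse temperature
    p_x = {1: 1.0 - p, -1: p}
    p_y = {1: 1.0 - p - delta + 2.0 * p * delta,
           -1: p + delta - 2.0 * p * delta}
    def energy(x, y):
        # E = -(1/beta) [ln P(x) + ln P(y|x) - ln P(y)], written term by term
        return (-x * y / 2.0
                - math.log((1.0 - delta) * delta) / (2.0 * beta)
                - x * math.log((1.0 - p) / p) / (2.0 * beta)
                - math.log((1.0 - p) * p) / (2.0 * beta)
                + (1 + y) / (2.0 * beta) * math.log(p_y[1])
                + (1 - y) / (2.0 * beta) * math.log(p_y[-1]))
    return beta, p_x, p_y, energy

def partition_function(p, delta, y):
    """Z = sum_x exp(-beta E(x, y)); expected to equal 1 for either y."""
    beta, _, _, energy = bsc_model(p, delta)
    return sum(math.exp(-beta * energy(x, y)) for x in (-1, 1))

def mi_thermodynamic(p, delta):
    """I = -E_Y{ beta U - H(X) }, using gamma*U -> H(X) at gamma = 0."""
    beta, p_x, p_y, energy = bsc_model(p, delta)
    h_x = -sum(q * math.log(q) for q in p_x.values())
    total = 0.0
    for y in (-1, 1):
        # Boltzmann weights; Z = 1, so these are the posteriors P(x|y).
        u = sum(math.exp(-beta * energy(x, y)) * energy(x, y) for x in (-1, 1))
        total += p_y[y] * (beta * u - h_x)
    return -total

def mi_closed_form(p, delta):
    """Closed-form mutual information of the biased BSC, in nats."""
    a = p + delta - 2.0 * delta * p
    return (delta * math.log(delta) + (1.0 - delta) * math.log(1.0 - delta)
            - a * math.log(a) - (1.0 - a) * math.log(1.0 - a))
```

For example, `mi_thermodynamic(0.3, 0.1)` and `mi_closed_form(0.3, 0.1)` coincide, and for p = 1/2 the closed form reduces to δ ln δ + (1-δ) ln(1-δ) + ln 2.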